static threshold
Adaptive Monitoring and Real-World Evaluation of Agentic AI Systems
Agentic artificial intelligence (AI) -- multi-agent systems that combine large language models with external tools and autonomous planning -- is rapidly transitioning from research laboratories into high-stakes domains. Our earlier "Basic" paper introduced a five-axis framework and proposed preliminary metrics such as goal drift and harm reduction, but did not provide an algorithmic instantiation or empirical evidence. This "Advanced" sequel fills that gap. First, we revisit recent benchmarks and industrial deployments to show that technical metrics still dominate evaluations: a systematic review of 84 papers from 2023--2025 found that 83% report capability metrics while only 30% consider human-centred or economic axes [2]. Second, we formalise an Adaptive Multi-Dimensional Monitoring (AMDM) algorithm that normalises heterogeneous metrics, applies per-axis exponentially weighted moving-average (EWMA) thresholds, and performs joint anomaly detection via the Mahalanobis distance [7]. Third, we conduct simulations and real-world experiments. AMDM cuts anomaly-detection latency from 12.3 s to 5.6 s on simulated goal drift and reduces false-positive rates from 4.5% to 0.9% compared with static thresholds. We present a comparison table and ROC/PR curves, and we reanalyse case studies to surface missing metrics. Code, data and a reproducibility checklist accompany this paper to facilitate replication. The code supporting this work is available at https://github.com/Manishms18/Adaptive-Multi-Dimensional-Monitoring.
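The monitoring loop the abstract describes (per-axis EWMA thresholds plus a joint Mahalanobis check over recent history) can be sketched as follows. This is a minimal illustration under assumed parameter names (`alpha`, `axis_k`, `joint_limit`), not the paper's reference implementation; see the linked repository for the real one.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of x from a distribution with given mean/cov."""
    diff = x - mean
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

class AMDMonitor:
    """Sketch: per-axis EWMA thresholds plus joint Mahalanobis detection."""
    def __init__(self, n_axes, alpha=0.2, axis_k=3.0, joint_limit=3.5):
        self.alpha = alpha            # EWMA smoothing factor (assumed value)
        self.axis_k = axis_k          # per-axis threshold in std-dev units
        self.joint_limit = joint_limit
        self.ewma = np.zeros(n_axes)
        self.ewmvar = np.ones(n_axes)
        self.history = []

    def update(self, metrics):
        x = np.asarray(metrics, dtype=float)
        # Per-axis check against the adaptive EWMA mean/variance
        resid = x - self.ewma
        axis_alarm = np.abs(resid) > self.axis_k * np.sqrt(self.ewmvar)
        self.ewma += self.alpha * resid
        self.ewmvar = (1 - self.alpha) * (self.ewmvar + self.alpha * resid**2)
        # Joint check: Mahalanobis distance against recent history
        self.history.append(x)
        joint_alarm = False
        if len(self.history) > 2 * len(x):
            H = np.array(self.history)
            cov = np.cov(H.T) + 1e-6 * np.eye(len(x))  # regularise
            joint_alarm = mahalanobis(x, H.mean(axis=0), cov) > self.joint_limit
        return bool(axis_alarm.any() or joint_alarm)
```

A stable stream keeps both checks quiet while the variance estimate tightens; a sudden jump on one axis then trips the per-axis threshold immediately, which is the mechanism behind the latency reduction the abstract reports.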
Adaptive Semantic Prompt Caching with VectorQ
Schroeder, Luis Gaspar, Liu, Shu, Cuadron, Alejandro, Zhao, Mark, Krusche, Stephan, Kemper, Alfons, Zaharia, Matei, Gonzalez, Joseph E.
Semantic prompt caches reduce the latency and cost of large language model (LLM) inference by reusing cached LLM-generated responses for semantically similar prompts. Vector similarity metrics assign a numerical score to quantify the similarity between an embedded prompt and its nearest neighbor in the cache. Existing systems rely on a static threshold to classify whether the similarity score is sufficiently high to result in a cache hit. We show that this one-size-fits-all threshold is insufficient across different prompts. We propose VectorQ, a framework to learn embedding-specific threshold regions that adapt to the complexity and uncertainty of an embedding. Through evaluations on a combination of four diverse datasets, we show that VectorQ consistently outperforms state-of-the-art systems across all static thresholds, achieving up to 12x increases in cache hit rate and error rate reductions up to 92%.
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
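For context, the one-size-fits-all baseline that VectorQ argues against can be sketched as below. `StaticThresholdCache` and its parameters are hypothetical names for illustration; VectorQ's contribution is to replace the single global `threshold` with learned, embedding-specific threshold regions.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

class StaticThresholdCache:
    """Baseline semantic cache: one global similarity threshold for all prompts."""
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # (embedding, cached LLM response) pairs

    def lookup(self, emb):
        """Return the nearest neighbor's response on a cache hit, else None."""
        if not self.entries:
            return None
        best_emb, best_resp = max(self.entries, key=lambda e: cosine(e[0], emb))
        return best_resp if cosine(best_emb, emb) >= self.threshold else None

    def insert(self, emb, response):
        self.entries.append((np.asarray(emb, dtype=float), response))
```

A threshold tuned for simple prompts produces false hits on prompts where small embedding distances hide large semantic differences, which is the failure mode the per-embedding threshold regions address.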
Why companies should use AI to fight cyberattacks
In any debate, there are always at least two sides. That reasoning also applies to whether or not it is a good idea to use artificial intelligence technology to try to blunt the advantage of cybercriminals who are already using AI to improve their success rate. In an email exchange, I asked Ramprakash Ramamoorthy, director of research at ManageEngine, a division of Zoho Corporation, for his thoughts on the matter. Ramamoorthy is firmly on the affirmative side of using AI to fight cybercrime. He said, "The only way to combat cybercriminals using AI-enhanced attacks is to fight fire with fire and employ AI countermeasures."
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.58)
Solar Radiation Anomaly Events Modeling Using Spatial-Temporal Mutually Interactive Processes
Zhang, Minghe, Xu, Chen, Sun, Andy, Qiu, Feng, Xie, Yao
Solar power installations are becoming common in residential and commercial areas, largely due to their decreasing costs. However, the power system is vulnerable to anomalies such as rainstorms or hurricanes, which are costly to recover from. As a result, detecting and predicting abnormal events from spatial-temporal series plays a vital role in solar power systems, with the aim of capturing the varied intrinsic causes of the anomalies. For example, a rainstorm and a drought would produce different types and patterns of anomalies. In many cases, an abnormal event will also start at one location and then propagate to its neighbors with a time delay, leading to spatial-temporal correlation among anomalies. It is therefore crucial to make observations at multiple locations, which together form the spatial-temporal series. In this paper, we address non-stationarity and strong spatial-temporal correlation through the following contributions:
- Strong spatial-temporal correlation: We present a spatial-temporal Bernoulli process (also extended to categorical observations), building on [19]. The model can flexibly capture spatial-temporal correlations and interactions without assuming time-decaying influence, and it can efficiently make predictions for any location at any future time for timely ramp event detection.
- North America > United States > California > Los Angeles County > Los Angeles (0.16)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- North America > United States > California > Santa Clara County > Sunnyvale (0.04)
- (2 more...)
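The core idea of event probabilities that depend on neighbors' past events can be illustrated with a toy simulator. This is a deliberately simplified logistic form with one-step neighbor influence, not the Bernoulli process of [19]; `adj`, `base`, and `influence` are assumed names.

```python
import numpy as np

def simulate_st_bernoulli(adj, base, influence, T, rng=None):
    """Toy spatial-temporal Bernoulli process: the event probability at each
    location depends on events at neighboring locations in the previous step."""
    rng = np.random.default_rng(rng)
    n = len(base)
    events = np.zeros((T, n), dtype=int)
    for t in range(1, T):
        # logit = per-location baseline + influence from neighbors' events at t-1
        logit = base + influence * (adj @ events[t - 1])
        p = 1.0 / (1.0 + np.exp(-logit))
        events[t] = rng.random(n) < p
    return events
```

With a positive `influence`, an event raises the probability of events at adjacent locations in the next step, reproducing the delayed neighbor-to-neighbor propagation the abstract describes.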
Deep Q-Network-based Adaptive Alert Threshold Selection Policy for Payment Fraud Systems in Retail Banking
Machine learning models have been widely used in fraud detection systems. Most of the research and development efforts have been concentrated on improving the performance of the fraud scoring models. Yet, the downstream fraud alert systems still have limited to no model adoption and rely on manual steps. Alert systems are pervasively used across all payment channels in retail banking and play an important role in the overall fraud detection process. Current fraud detection systems end up with large numbers of dropped alerts due to their inability to account for the alert processing capacity. Ideally, alert threshold selection enables the system to maximize fraud detection while balancing the upstream fraud scores and the available bandwidth of the alert processing teams. However, in practice, fixed thresholds that are used for their simplicity do not have this ability. In this paper, we propose an enhanced threshold selection policy for fraud alert systems. The proposed approach formulates threshold selection as a sequential decision making problem and uses Deep Q-Network based reinforcement learning. Experimental results show that this adaptive approach outperforms the current static solutions by reducing fraud losses and improving the operational efficiency of the alert system.
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Hawaii (0.04)
- North America > United States > Alabama (0.04)
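The sequential-decision framing can be sketched with tabular Q-learning in place of the paper's Deep Q-Network. The environment here (uniform fraud scores, a fixed analyst `capacity`, a backlog-bucket state, a penalty for dropped alerts) is entirely hypothetical; only the structure (state, action = threshold choice, reward) mirrors the formulation.

```python
import numpy as np

def train_threshold_policy(episodes=200, thresholds=(0.5, 0.7, 0.9),
                           capacity=20, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning sketch (simplified from DQN): pick an alert
    threshold each step to balance caught fraud against analyst capacity."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = 3, len(thresholds)   # state: backlog low/med/high
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        state = 0
        for _ in range(24):                    # one day of hourly decisions
            # epsilon-greedy action selection over the candidate thresholds
            a = rng.integers(n_actions) if rng.random() < eps else int(Q[state].argmax())
            scores = rng.random(100)           # hypothetical fraud scores
            alerts = int((scores > thresholds[a]).sum())
            worked = min(alerts, capacity)     # alerts beyond capacity are dropped
            reward = worked - 2 * max(0, alerts - capacity)
            next_state = min(2, alerts // capacity)
            # standard Q-learning update
            Q[state, a] += alpha * (reward + gamma * Q[next_state].max() - Q[state, a])
            state = next_state
    return Q, thresholds
```

In this toy setup the learned policy settles on the highest threshold, since lower thresholds flood the queue and incur the dropped-alert penalty; the paper's DQN plays the same role over a richer state space.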
Fast and Accurate Inference with Adaptive Ensemble Prediction in Image Classification with Deep Neural Networks
Ensembling multiple predictions is a widely used technique to improve the accuracy of various machine learning tasks. In image classification tasks, for example, averaging the predictions for multiple patches extracted from the input image significantly improves accuracy. Using multiple networks trained independently to make predictions improves accuracy further. One obvious drawback of the ensembling technique is its higher execution cost during inference. If we average 100 predictions, the execution cost will be 100 times as high as the cost without the ensemble. This higher cost limits the real-world use of ensembling, even though using it is almost the norm to win image classification competitions. In this paper, we describe a new technique called adaptive ensemble prediction, which achieves the benefits of ensembling with much smaller additional execution costs. Our observation behind this technique is that many easy-to-predict inputs do not require ensembling. Hence we calculate the confidence level of the prediction for each input on the basis of the probability of the predicted label, i.e., the outputs from the softmax, during the ensembling computation. If the prediction for an input reaches a high enough probability on the basis of the confidence level, we stop ensembling for this input to avoid wasting computation power. We evaluated adaptive ensemble prediction on various datasets and showed that it significantly reduces computation time while achieving accuracy similar to naive ensembling.
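The early-exit rule described above (stop averaging once the running softmax output is confident enough) can be sketched as follows; the `confidence` cutoff and the `predict_fns` interface are assumed for illustration, not taken from the paper.

```python
import numpy as np

def adaptive_ensemble(predict_fns, x, confidence=0.95):
    """Average model predictions one at a time and stop early once the
    running mean assigns high enough probability to the top label."""
    running = None
    for used, predict in enumerate(predict_fns, start=1):
        p = predict(x)                       # softmax probability vector
        # incremental running mean of the probabilities seen so far
        running = p if running is None else running + (p - running) / used
        if running.max() >= confidence:      # easy input: skip remaining models
            break
    return running, used                     # averaged probs, models evaluated
```

Easy inputs exit after one or two models while hard inputs consume the full ensemble, which is where the reported computation savings come from.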
Optimizing Alerts on Free space on disks using Machine Learning - OpsClarity
The available space on the disk (diskfree) has a significant and often catastrophic impact on applications and services running on the system. For this reason, every DevOps engineer knows that it is crucial to carefully monitor disk usage in all critical systems, especially ones that tend to rapidly use up disk space, such as heavily used Hadoop stores, applications with extensive logging, Kafka clusters with a long retention period, etc. The most common monitors used for diskfree metrics rely on a static threshold where the threshold is set by a DevOps engineer with intricate knowledge of the system and applications running on the system. For example, a DevOps engineer may choose to set a static threshold at 5%, i.e., the monitor will trigger an alert if diskfree falls below 5%. In our experience, this approach is inefficient for several reasons.
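The static monitor the post describes reduces to a one-line check; a minimal sketch, with the 5% default mirroring the example above (the function name is hypothetical).

```python
import shutil

def diskfree_alert(path=".", threshold_pct=5.0):
    """Static-threshold diskfree monitor: alert when free disk space
    falls below a fixed percentage chosen by an operator."""
    usage = shutil.disk_usage(path)
    free_pct = 100.0 * usage.free / usage.total
    return free_pct < threshold_pct, free_pct
```

The inefficiency the post goes on to discuss is visible in the signature: `threshold_pct` encodes one engineer's knowledge of one system and ignores how fast the disk is actually filling.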